A Fine-grained Evaluation Framework for Machine Translation System Development
Abstract
Intelligibility and fidelity are the two key notions in machine translation system evaluation, but they do not always provide enough information for system development. Detailed information about the types of errors a translation system makes, and how many of each type, is important for diagnosing the system, evaluating the translation approach, and allocating development resources. In this paper, we present a fine-grained machine translation evaluation framework that, in addition to the notions of intelligibility and fidelity, includes a typology of errors common in automatic translation, as well as several other properties of source and translated texts. The proposed framework is informative, sensitive, and relatively inexpensive to apply, and it can be used to diagnose and quantify the types and likely sources of translation error. The framework has been used in two evaluation experiments on the LMT English-Spanish machine translation system, and has already suggested one important architectural improvement to the system.
Similar resources
A Framework for Diagnostic Evaluation of MT Based on Linguistic Checkpoints
This paper describes an approach to the diagnostic evaluation of machine translation (MT) based on linguistic checkpoints, which can provide valuable information both to the developers and to the end-users of MT systems. We present a flexible framework and a new tool, DELiC4MT, for fine-grained diagnostic MT evaluation which can be extended to any language pair and applied to any evaluation tar...
Fine-grained human evaluation of neural versus phrase-based machine translation
We compare three approaches to statistical machine translation (pure phrase-based, factored phrase-based and neural) by performing a fine-grained manual evaluation via error annotation of the systems’ outputs. The error types in our annotation are compliant with the multidimensional quality metrics (MQM), and the annotation is performed by two annotators. Inter-annotator agreement is high for s...
Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models
Title of dissertation: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models Yuval Marton, Doctor of Philosophy, 2009 Dissertation directed by: Professor Philip Resnik, Department of Linguistics and Institute for Advanced Computer Studies This dissertation focuses on effective combination of data-driven natural language processing (NLP) approaches with lingu...
The Hiero Machine Translation System: Extensions, Evaluation, and Analysis
Hierarchical organization is a well known property of language, and yet the notion of hierarchical structure has been largely absent from the best performing machine translation systems in recent community-wide evaluations. In this paper, we discuss a new hierarchical phrase-based statistical machine translation system (Chiang, 2005), presenting recent extensions to the original proposal, new e...
Contrastive Lexical Evaluation of Machine Translation
This paper advocates a complementary measure of translation performance that focuses on the contrastive ability of two or more systems or system versions to adequately translate source words. This is motivated by three main reasons: 1) existing automatic metrics sometimes do not show significant differences that can be revealed by fine-grained focussed human evaluation, 2) these metrics are b...